ABSTRACT



A technique for isolating faults in a communication network
is described. The technique can be utilized in high speed
communications networks such as all-optical networks
(AONs). The technique is distributed, requires only local
network node information and can localize attacks for a
variety of network applications. The technique is particularly
well suited to the problem of attack propagation which
arises in AONs. The technique finds application in a variety
of network restoration paradigms, including but not limited
to automatic protection switching and loopback protection,
and provides proper network operation with reduced, or in
some cases no, data loss and bounded delay time regardless of
the location of the attack or the physical span of the network.
Since the technique is distributed, its associated delays
do not depend on the number of nodes in the network. Hence
the technique avoids the computational complexity inherent
to centralized approaches. It is thus scalable and relatively
rapid. Furthermore, the delays in attack isolation do not
depend on the transmission delays in the network. A network
management system can therefore offer hard upper bounds
on the loss of data due to failures or attacks. Fault localization
with centralized algorithms depends on transmission
delays, which are proportional to the distance traversed by
the data. Since the described techniques for fault localization
are not dependent on centralized computations, the techniques
are equally applicable to local area networks, metropolitan
area networks, or wide area networks.

16 Claims, 16 Drawing Sheets 
FAULT ISOLATION FOR COMMUNICATION
NETWORKS FOR ISOLATING THE SOURCE 
OF FAULTS COMPRISING ATTACKS, 
FAILURES, AND OTHER NETWORK 
PROPAGATING ERRORS 

GOVERNMENT RIGHTS 

This invention was made with government support under 
Contract No. F19628-95C-0002 awarded by the Department 
of the Air Force. The Government has certain rights in this 
invention. 

RELATED APPLICATIONS 

Not applicable. 

FIELD OF THE INVENTION 

This invention relates generally to communication networks
and more particularly to localizing attacks or failures
in communications networks.

BACKGROUND OF THE INVENTION 

As is known in the art, there is a trend to provide
communication networks which operate with increasing
information capacity. This trend has led to the use of
transmission media and components capable of providing
information over relatively large signal bandwidths. One
type of transmission medium capable of providing such
bandwidths is an optical carrier transmission medium such as
glass fibers, which are also referred to as optical fibers or
more simply fibers.

As is also known, an all-optical network (AON) refers to
a network which does not contain electronic processing
components. AONs utilize all-optical switching components
which afford network functionality and all-optical
amplification components which counteract attenuation of the
optical signals through the network. Since AONs do not contain
electronic processing components, AONs avoid network
bottlenecks caused by such electronic processing elements.

Because AONs support delivery of large amounts of
information, there is a trend to utilize AONs in those
network applications which require communications rates in
the range of 1 terabit per second and greater. While network
architectures and implementations of AONs vary,
substantially all of the architectures and implementations
utilize devices or components such as optical switches, couplers,
filters, attenuators, circulators and amplifiers. These building
block devices are coupled together in particular ways to
provide AONs having particular characteristics.

The devices which perform switching and amplification
of optical signals have certain drawbacks. In particular,
owing to imperfections and necessary physical tolerances
associated with fabricating practical components, the
components allow so-called "leakage signals" to propagate
between signal ports and signal paths of the devices. Ideally,
device signal paths are isolated from each other. Such
leakage signals are often referred to as "crosstalk signals"
and components which exhibit such leakage characteristics
are said to have a "crosstalk" characteristic.
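
As a worked illustration of how finite isolation translates into a
leakage signal level, the relation below expresses the leaked power
in terms of the input power and the isolation I in decibels. The
symbols and the numeric values (10 mW input, 30 dB isolation) are
hypothetical and chosen only for arithmetic convenience; they are not
taken from the description.

```latex
P_{\text{leak}} = P_{\text{in}} \cdot 10^{-I/10}, \qquad
P_{\text{in}} = 10\ \text{mW},\ I = 30\ \text{dB}
\;\Rightarrow\;
P_{\text{leak}} = 10\ \text{mW} \times 10^{-3} = 0.01\ \text{mW}.
```

Under these assumed values, raising the input power by a factor of
one hundred raises the leaked power by the same factor, which is why
a sufficiently strong signal on one channel can matter on another.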

The limitations in the isolation due to the physical properties
of switches and amplifiers can be exploited by a
nefarious user. In particular, a nefarious user on one signal
channel can affect or attack other signal channels having
signal paths or routes which share devices with the nefarious
user's channel. Since signals flow unchecked through the
AON, the nefarious user may use a legitimate means of
accessing the network to effect a service disruption attack,
causing a quality of service degradation or outright service
denial. The limitations in the operating characteristics of
optical components in AONs thus have important security
ramifications.

One important security issue for optical networks is that
service disruption attacks can propagate through a network.
Propagation of attacks results in the occurrence of failures in
portions of the network beyond where the attack originated.
This is in contrast to failure due to component fatigue.
Failures due to component fatigue generally will not propagate
through the network but will affect a limited number of
nodes and components in the network. Since the mechanisms
and consequences of a service disruption attack are
different from those of a failure, it is necessary to provide
different responses to attacks and failures. Thus, it is important
to have the ability to differentiate between a failure and
an attack and to have the ability to locate the source of an
attack.

Referring to FIG. 1, an example of an attack which
propagates through a switch 10 and an amplifier 16 is
shown. The switch 10 includes switch ports 10a-10d with a
first switch channel 12a provided between switch ports 10a
and 10c and a second switch channel 12b provided between
switch ports 10b and 10d. The switch 10 has a finite amount
of isolation between the first and second switch channels
12a, 12b. Owing to the finite isolation characteristics of the
switch 10, a portion of a signal propagating along the first
switch channel 12a can be coupled to the second switch
channel 12b through a so-called "leakage" or "crosstalk"
signal path or channel 14. Thus, a crosstalk signal 15
propagates from the first switch channel 12a through the
crosstalk channel 14 to the second switch channel 12b.

The output of the second switch channel 12b is coupled
through switch port 10d to an input port 16a of a two-channel
amplifier 16. The amplifier receives a second channel
12c at a second amplifier input port 16b. If the crosstalk
signal 15 on channel 12b is provided having a particularly
high signal level, the crosstalk signal 15 propagating in
channel 12b of the amplifier 16 couples power from the
signal propagating on the second amplifier channel 12c
thereby reducing the signal level of the signal propagating
on the channel 12c. This is referred to as a gain competition
attack. It should thus be noted that a signal propagating on
the first channel 12a can be used to affect the third channel
12c, even though the channels 12a and 12c are routed
through distinct components (i.e. channel 12a is routed
through the switch 10 and channel 12c is routed through the
amplifier 16).

It should also be noted that in this particular example, the
gain competition attack was executed via a signal inserted
into the channel 12b via the crosstalk channel 14 existent in
the switch 10. Thus, a user with a particularly strong signal
can couple power from the signals of other users without
directly accessing an amplifier component. With this
technique, a nefarious user can disrupt several users who
share amplifiers which receive a gain competition signal
from the nefarious user via a different component propagating
on the channel 12c.
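
The attack path described in conjunction with FIG. 1 can be sketched
numerically. The following is a minimal toy model, not the behavior
of any particular device: the 30 dB isolation, the power levels, the
small-signal gain and the saturation formula (a single gain shared by
all amplifier channels that falls as total input power rises) are all
assumptions introduced here for illustration, as are the function
names.

```python
"""Toy model of the FIG. 1 attack path: switch crosstalk followed by
amplifier gain competition. All numeric values are hypothetical."""


def crosstalk_power_mw(p_in_mw: float, isolation_db: float) -> float:
    """Power leaked from one switch channel into another.

    A switch with `isolation_db` of isolation passes the fraction
    10^(-isolation_db / 10) of the input power to the other channel.
    """
    return p_in_mw * 10 ** (-isolation_db / 10)


def shared_gain(small_signal_gain: float, total_in_mw: float,
                saturation_mw: float) -> float:
    """Assumed saturation model: every channel sees the same gain,
    which falls as the total input power grows."""
    return small_signal_gain / (1.0 + total_in_mw / saturation_mw)


def victim_output_mw(attacker_mw: float,
                     isolation_db: float = 30.0,
                     signal_12b_mw: float = 0.1,
                     signal_12c_mw: float = 0.1,
                     small_signal_gain: float = 100.0,
                     saturation_mw: float = 1.0) -> float:
    """Output power of channel 12c after the two-channel amplifier.

    The attacker transmits on channel 12a; only its crosstalk reaches
    channel 12b, and channel 12b shares the amplifier with 12c.
    """
    leak_mw = crosstalk_power_mw(attacker_mw, isolation_db)
    total_in_mw = signal_12b_mw + leak_mw + signal_12c_mw
    gain = shared_gain(small_signal_gain, total_in_mw, saturation_mw)
    return signal_12c_mw * gain


if __name__ == "__main__":
    for attacker_mw in (0.1, 10.0, 1000.0):
        out = victim_output_mw(attacker_mw)
        print(f"attacker on 12a: {attacker_mw:8.1f} mW -> "
              f"channel 12c output: {out:.3f} mW")
```

In this sketch the attacker never touches the amplifier directly, yet
the output on channel 12c drops as the attacker's power on channel 12a
rises, which mirrors the qualitative point made above.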

FIG. 2 illustrates one scenario in which it is necessary to
differentiate an attack carried out by the network traffic from a
physical failure and in which it is important to be able to
localize the source of the attack. In FIG. 2, a portion of a
network includes a first network node 17a provided by a first
element which here corresponds to a switch 10 and a second
network node 17b provided by a second element which here
corresponds to a second switch 18. It should be noted that
the nodes 17a, 17b are here shown as switches for purposes
of illustration only and that in other embodiments, the nodes
17a, 17b may be provided from elements other than
switches. In this example, it is assumed that each of the
nodes 17a, 17b guards against jamming attacks by pinpointing
any channel on which is propagating a signal having a
signal level higher than a predetermined threshold level and
then disconnecting the channel on which the high level
signal propagates.

In FIG. 2, the switch 10 includes switch ports 10a-10d
with a first switch channel 12a provided between switch
ports 10a and 10c and a second switch channel 12b provided
between switch ports 10b and 10d. The switch 10 has a finite
amount of isolation between the first and second switch
channels 12a, 12b. Channels 12a, 12b both propagate
through the node 17a, which in this particular example
corresponds to the switch 10, and both channels 12a, 12b
propagate signals having the same carrier signal wavelength.
Owing to the finite isolation characteristics between channels
12a, 12b in the switch 10, a portion of a signal
propagating along the first switch channel 12a can be
coupled to the second switch channel 12b through a
crosstalk channel 14. Thus, the crosstalk signal 15 propagates
from the first switch channel 12a through the crosstalk
channel 14 to the second switch channel 12b.

If an excessively powerful signal (e.g. one having a signal
level equal to or greater than the predetermined threshold
level) is introduced via switch port 10a onto channel 12a,
then channel 12a will be disconnected. The crosstalk signal
15, however, from channel 12a is superimposed upon channel
12b at node 17a. If the carrier signals on the two
channels 12a, 12b have substantially the same wavelength,
the signal levels of the two carrier signals may add. Thus, the
signal propagating in channel 12b, in turn, may exceed the
predetermined threshold signal level.

The crosstalk signal 15 and the carrier signal propagating
on channel 12b are coupled to the second switch 18 which
is provided having first and second channels 12b, 12c.
Switch 18, like switch 10, has a finite amount of isolation
between the first and second switch channels 12b, 12c.
Channels 12b, 12c both propagate through the same node
17b, which in this particular example corresponds to the
switch 18. Furthermore, signals propagating on the channels
12b, 12c have substantially the same carrier signal wavelength.
Owing to the finite isolation characteristic of the
switch 18, a portion of the signal propagating along the
channel 12b can be coupled to the second switch channel
12c through a crosstalk channel 20. Thus, the crosstalk
signal 15 propagates from the first switch channel 12b
through the crosstalk channel 20 to the second switch
channel 12c resulting in a second crosstalk signal 21
propagating on the channel 12c.

Since the carrier signals propagating in channels 12a, 12b
and 12c each have substantially the same wavelength, if the
amplitude of the crosstalk signal is sufficiently large,
disruption of the signals propagating on the channel 12c can
occur.

In this case both nodes 17a, 17b may correctly recognize
the failure as a crosstalk jamming attack. Node 17a will
correctly ascertain that the offending channel is channel 12a
but node 17b will ascertain the offending channel as channel
12b. If the network has no means of localizing the source of
the attack, then node 17a will disconnect channel 12a and
node 17b will disconnect channel 12b. Channel 12b will,
therefore, have been erroneously disconnected. Thus, to
allow the network to properly recover from attacks, it is
necessary to ascertain attacks carried out by network traffic
and to localize the source of these attacks.

In networks having relatively high data transmission
rates, ultrafast restoration is typically preplanned and based
upon local information (i.e. information local to a network
node). The restoration route is generally stored in a memory
device within the network nodes. This approach avoids the
delays associated with dynamically computing routes once a
failure occurs. To utilize such a pre-planned or pre-stored
approach, it is thus necessary to store the alternate route
information at each of the network nodes.

As explained above in conjunction with FIG. 2, the
techniques for responding to signal transmission problems
due to a failure which occurs because of natural fatigue of
components or physical sabotage of the network are not well
suited to responding to signal transmission problems caused
by the signals themselves. For example, one technique for
recovering from a node failure (i.e. a failure due to natural
fatigue of components or physical sabotage of the network)
is to reroute traffic away from the failed node. This technique
is used in synchronous optical networks (SONET) and
synchronous digital hierarchy (SDH) bidirectional
self-healing rings (SHRs). In a SONET/SDH bidirectional SHR,
if the traffic itself is the cause of the failure, as is the case in
the amplifier and switch attacks discussed above, then
failures may be caused throughout the network without any
restoration.

Another technique for recovering from a failure is to
localize component failures. Once the failed components are
localized, they can be physically removed from the network
and repaired or replaced with other components. One problem
with this technique, however, is that it results in service
degradation or denial while the failed component or
components are being identified and repaired or replaced.
Another problem with this technique is that it may take a
relatively long period of time before the failed component or
components can be identified and repaired or replaced.
Furthermore, since each failed component must be physically
located and repaired or replaced, further time delays
can be incurred.

Thus, if techniques intended to respond to naturally
occurring failures are applied to cases of service disruption
attacks in AONs, an attack at a single point can lead to
widespread failures within the network. It is, therefore,
important to be able to ascertain whether an attack is caused
by traffic itself or from a failure which occurs because of
natural fatigue of components or physical sabotage of the
network.

For example, assume there is an attack on a node i, which
carries channels 1, 2 and 3, from channel 1. If a network
management system deals with all failures as though they
were benign failures (e.g. a failure due to component
fatigue), then the network management system assumes that
node i failed of its own accord and reroutes the three
channels to some other node, say node j. After that rerouting,
node j will appear as having failed because channel 2 will
attack node j. The network may then reroute all three
channels to node k, and so on. Therefore, it is important for
node i under attack to be able to recognize an attack coming
from its traffic stream and to differentiate it from a physical
hardware failure which is not due to the traffic streams
traversing node i.

Attacks such as the amplifier and switch attacks discussed
above can lead to service denial. The ability to use attacks
to deny service stems from the fact that attacks can spread,
causing malfunctions at several locations, whereas failures
generally do not disrupt the operation of several devices.
Thus, while a single network element failure may cause
several network elements to have corrupted inputs and
outputs, the failure will not generally cause other network
elements to be defective in their operation.

SUMMARY OF THE INVENTION

In view of the above, it has been recognized that since the
results of component failures and attacks are often similar
(e.g. improper operation of one or more network components
or nodes), the difference is transparent to a network
node or system user. Because of this transparency there is no
absolute metric to determine whether an input is faulty or
not. Instead, it is necessary to examine the operation of a
node, i.e., the relation between the input and the output. A
failure will lead to incorrect operation of the node. An attack,
as illustrated above in conjunction with FIGS. 1 and 2, can
cause network elements not only to have corrupted inputs
and outputs, but the nature of those corrupted inputs can lead
to improper operation of the network elements themselves.
Hence, if alarms are raised at individual network elements
by improper operation of the network element, a fault will
lead to a single alarm. An attack, on the other hand, may lead
to alarms in several nodes downstream (in the flow of
communications) of the first node or network point which is
attacked. Thus, if a restoration scheme is prepared to recover
from failures but encounters instead an attack, the restoration
scheme itself may malfunction and cause failures.

FIGS. 3, 3A illustrate SONET/SDH approaches to recovery
schemes. These recovery schemes are based on rings.
SONET/SDH allows for network restoration after failure
using two techniques illustrated respectively in FIGS. 3 and
3A.

Referring now to FIG. 3, a ring 24 having network nodes
24a-24e utilizes a recovery technique typically referred to
as automatic protection switching (APS). The APS technique
utilizes two streams 26a, 26b which traverse physically
node or link disjoint paths between a source and a
destination. In this particular example, stream 26a couples a
source node 24a to a destination node 24d with information
flowing in a clockwise direction through intermediate nodes
24b, 24c. Stream 26b, on the other hand, couples the source
node 24a to destination node 24d with information flowing
in a counterclockwise direction through intermediate node
24e. In case of failure of a node or link along one of the
streams, e.g. stream 26a, the receiving node listens to the
redundant, backup stream, e.g. stream 26b. Such a technique
is used in the SONET unidirectional path switched ring
(UPSR) systems.

Referring now to FIG. 3A, a ring 28 having network nodes
28a-28e utilizes a recovery technique typically referred to
as loopback protection. In the loopback approach, in case of
a failure, a single stream 29a is rerouted onto a backup
channel 29b. Such an approach is used in the SONET
bidirectional line switched ring (BLSR).

For any node or edge redundant graph, there exists a pair
of node or edge-disjoint paths, that can be used for APS,
between any two nodes. Automatic protection switching
over arbitrary redundant networks need not restrict itself to
two paths between every pair of nodes, but can instead be
performed with trees, which are more bandwidth efficient for
multicast traffic. For loopback protection, most of the
schemes have relied on interconnection of rings or on
finding ring covers in networks. Loopback can also be
performed on arbitrary redundant networks.

In FIGS. 4 and 4A, in which like elements are provided
having like reference designations, the manner in which a
single attack may lead to service disruption in the case of
loopback recovery is shown.

Referring briefly to FIG. 4, a portion of a network 30
includes network nodes j, k. For purposes of illustration,
assume node j is the attack source (i.e. node j is attacked, for
instance by a nefarious user using node j as a point of entry
into the network for insertion of a spurious jamming signal).
The jamming signal causes the nodes adjacent to node j to
infer that node j has failed, or is "down." The same jamming
signal, upon traveling to node k, will cause the nodes
adjacent to node k to infer that node k has failed. If both
nodes j and k are considered as individual failures by a
network management system, then loopback will be performed
to bypass both nodes j and k in a ring. Thus, all traffic
which passed through both nodes j and k will be disrupted,
as indicated by path 31 in FIG. 4 by the loopback at each of
the nodes j, k.

Referring now to FIG. 4A, if node j is correctly localized
as the source of the attack, then loopback effected to bypass
node j will lead to correct operation of the network, with
only the inevitable loss of traffic which had node j as its
destination or origination. Traffic which traversed node j
from node i is backhauled through node j. Thus, by correctly
localizing the source of an attack, the amount of traffic which
is lost can be reduced.

Briefly, and in general overview, work in the area of fault
localization in current data networks can be summarized and
categorized as three different sets of fault diagnosis
frameworks: (1) fault diagnosis for computing networks; (2)
probabilistic fault diagnosis by alarm correlation; and (3)
fault diagnosis methods specific to AONs.

The fault diagnosis framework for computing networks
covers those cases in which units communicate with subsets
of other units for testing. In this approach, each unit is
permanently either faulty or operational. The test on a unit
to determine whether it is faulty or operational is reliable
only for operational units. Necessary and sufficient conditions
for the testing structure for establishing each unit as
faulty or operational as long as the total number of faulty
elements is under some bound are known in the art.
Polynomial-time algorithms for identifying faults in
diagnosable systems have been used. Instead of being able to
determine exactly the faulty units, another approach has
been to determine the most likely fault set.

All of the above techniques have several drawbacks. First,
they require each unit to be fixed as either faulty or
operational. Hence, sporadic attacks which may only temporarily
disable a unit cannot be handled by the above approaches.
Thus, the techniques are not robust. Second, the techniques
require tests to be carefully designed and sequentially
applied. Moreover, the number of tests required rises with
the possible number of faults. Thus, it is relatively difficult
to scale the techniques. Third, the tests do not establish any
type of causality among failures and thus the tests cannot
establish the source of an attack by observing other attacks.
The techniques, therefore, do not allow network nodes to
operate with only local information. Fourth, fault diagnosis
by many successive test experiments may not be rapid
enough to perform automatic recovery.

The probabilistic fault diagnosis approaches for performing
fault localization in networks typically utilize a Bayesian
analysis of alarms in networks. In this approach, alarms
from different network nodes are collected centrally and
analyzed to determine the most probable failure scenario.
Unlike the fault diagnosis for computing networks
techniques, the Bayesian analysis techniques can be used to
discover the source(s) of attacks thus enabling automatic
recovery. Moreover, the Bayesian analysis techniques can
analyze a wide range of time-varying attacks and thus these
techniques are relatively robust. All of the above results,
however, assume some degree of centralized processing of
alarms, usually at the network and subnetwork level. Thus,
one problem with this technique is that an increase in the
size of the network leads to a concomitant increase in the
time and complexity of the processing required to perform
fault localization.

Another problem with the Bayesian analysis techniques is
that there are delays involved with propagation of the
messages to the processing locations. In networks having a
relatively small number of processing locations, the delays
are relatively small. In networks having a relatively large
number of processing locations, however, the delays may be
relatively long and thus the Bayesian analysis techniques
may be relatively slow. Thus the Bayesian analysis techniques
may not scale well as network data rates increase or
as the size of the network increases. If either the data rate or
the span of the network increases, there is a growth in the
latency of the network, i.e. the number of bits in flight in the
network. The combined increase in processing delay and in
latency implies that many bits may be beyond the reach of
corrective measures by the time attacks are detected.
Therefore, an increase in network span and data rate would
lead to an exacerbation of the problem of insufficiently rapid
detection.

For AONs, fault diagnosis and related network management
issues have been considered. Some of the management
issues for other high-speed electro-optic networks are also
applicable. The problem of spreading of fault alarms, which
exists for several types of communication networks, is
exacerbated in AONs by the fact that signals flow through
AONs without being processed. To address faults only due
to fiber failure, only the nodes adjacent to the failed fiber
need to find out about the failure and a node need only
switch from one fiber to another. For failures which occur in
a chain of in-line repeaters which do not have the capability
to switch from one fiber to another, one approach is as
follows. When a failure occurs, the alarm due to the failure is
generated by the in-line repeater immediately after the
failure. The failure alarm then travels down to a node which
can perform failure diagnostics. The failure alarms generated
downstream of the first failure are masked by using upstream
precedence. Failure localization can then be accomplished by
having the node capable of diagnostics send messages over a
supervisory channel towards the source of the failure until the
failure is localized and an alarm is generated at the first
repeater after a failure. These techniques require diagnostic
operations to be performed by remote nodes and require
two-way communications between nodes.

It would, therefore, be desirable to provide a technique for
stopping an attack on a signal channel by a nefarious user
which does not result in service degradation or denial. It
would also be desirable to provide a technique for localizing
an attack on a network. It would further be desirable to
provide a relatively robust, scalable technique which localizes
rapidly the source of an attack in a network and allows
rapid, automatic recovery in the network.

In accordance with the present invention, a distributed
method for performing attack localization in a network
having a plurality of nodes includes the steps of (a)
determining, at each of the plurality of nodes in the network,
if there is an attack on the node; (b) transmitting one or more
messages using local communication between first and
second nodes wherein a first one of the nodes is upstream
from a second one of the nodes and wherein each of the one
or more messages indicates that the node transmitting the
message detected an attack at the message transmitting
node; and (c) processing messages received in a message
processing one of the first and second nodes to determine if
the message processing node is the first node to sustain an
attack on a certain channel. With this particular arrangement, a
technique for finding the origin of an attacking signal is
provided. By processing node status information at each
node in the network and generating responses based on the
node status information and the messages received by the
node, the technique can be used to determine whether an
attack is caused by network traffic or by failure of a network
element or component. In this manner, an attack on the
network can be localized. By localizing the attack, the
network maintains quality of service. Furthermore, while the
technique of the present invention is particularly useful for
localization of propagating attacks, the technique will also
localize component failures which can be viewed as
non-propagating attacks. The technique can be applied to
perform loopback restoration as well as automatic protection
switching (APS). Thus, the technique provides a means for
utilizing attack localization with a loopback recovery
technique or an APS technique to avoid unnecessary service
denial. The nodes include a response processor which
processes incoming messages and local node status information
to determine the response of the node. The particular
response of each node depends upon a variety of factors
including but not limited to the particular type of network,
the particular type of recovery scheme (e.g. loopback or
automatic protection switching), the particular type of
network application and the particular goal (e.g. raise an alarm,
reroute the node immediately before and/or after the
attacked node in the network, etc.).

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing features of the invention, as well as the
invention itself, may be more fully understood from the
following detailed description of the drawings, in which:

FIG. 1 is a block diagram illustrating a network attack
implemented through a switch and an amplifier;

FIG. 2 is a block diagram illustrating a network attack
implemented through a pair of switches;

FIG. 3 is a block diagram illustrating the results of an
automatic protection switching technique in a ring network;

FIG. 3A is a block diagram illustrating the results of a
loopback protection technique in a ring network;

FIG. 4 is a block diagram illustrating a first possible result
of loopback recovery when a pair of nodes detect an attack
and nodes are believed to be faulty;

FIG. 4A is a block diagram illustrating a second
possible result of loopback recovery when a pair of nodes
detect an attack and the attack source is localized;

FIG. 5 is a block diagram of a network;

FIG. 6 is a flow diagram illustrating the processing steps
performed by nodes in a network to perform attack
localization;

FIG. 7 is a flow diagram illustrating the processing steps
performed by nodes in a network to ascertain a fault type and
transmit the fault type to adjacent nodes in the network;

FIG. 7A is a flow diagram illustrating the processing steps
performed to determine whether a node is the source of an
attack or whether the source of the attack is an upstream
node;
FIG. 8 is a block diagram illustrating a propagating attack
which does not disrupt all nodes in a channel;

FIG. 9 is a flow diagram illustrating processing steps
performed by nodes in a network to determine whether the
node is the source of an attack or whether the attack is being
carried by data from a node upstream;

FIG. 10 is a flow diagram illustrating processing steps
performed by nodes to utilize transmission of messages to
nodes which are upstream in a network;

FIG. 11 is a flow diagram illustrating processing steps
performed by nodes to determine whether a response should
be an alarm response or an alert response;

FIG. 12 is a block diagram illustrating automatic protection
switching performed in accordance with the techniques
of the present invention;

FIGS. 13, 13A are a series of flow diagrams illustrating
processing steps performed by nodes to implement loopback
recovery in accordance with the present invention; and

FIG. 14 is a block diagram illustrating loopback protection
performed in accordance with the techniques of the
present invention.

DESCRIPTION OF THE PREFERRED
EMBODIMENTS

Terminology

Before describing the apparatus and processes for performing
fault isolation in communication networks, some
introductory concepts and terminology are explained. The
term "network" as used herein refers to a collection of
assets, switching apparatus and conduits which permit the
transmission of resources. Thus, the networks may be used
for communications systems, data transmission systems,
information systems or power systems. In one embodiment,
the network may be provided as an internet. The resources
may be provided as optical signals such as power signals,
information signals, etc.

The term "node" refers to a component or group of
components or a processing element or a combination or
group of processing elements in a network. A source node
refers to a point of origin of a message and a destination
node refers to an intended point of receipt of a message. A
first node in a series of nodes affected by an attack is referred
to as the "source of the attack", or an "attack source" even
though the attack may have been launched at a different
point in the network.

A "channel" refers to a combination of transmission
media and equipment capable of receiving signals at one
point and delivering the same or related signals to another
point. A "message" refers to information exchanged
between nodes in a network. Thus messages are transmitted
between nodes on channels.

The term "failure" refers to any malfunction which (a)
affects the input-to-output relationship at a node of a network;
or (b) leads to imperfect or incorrect operation of
a network, network transmission media or a network node
(e.g. a malfunction or degradation of a node or a link due to
natural fatigue of components or physical sabotage of the
network).

The term "attack" refers to a process which causes a
failure at one or more links or nodes. An attack is a process
which affects signal channels having signal paths or routes
which share devices with a nefarious user's channel.

The term "fault" refers to a component failure of a
network element. Typically, the term "fault" is used to
describe a hardware failure in a network and in particular a
hardware failure of a network element.

The term "attack localization" or more simply "localization"
refers to the process by which the source of an attack
in the network is isolated. The same process can also
pinpoint other nodes in the network which may experience
a failure due to an attack but which are not the source of an
attack.

It should be noted that the techniques of the present
invention have applicability to a wide variety of different
types of networks and are advantageously used in those
applications which provide relatively high-speed optical
communications. For example the techniques may be used
in all-optical networks (AONs) and in SONET and SDH
networks which each include network restoration protocols.
It should be noted that although SONET/SDH are not
all-optical standards, the rates supported by these standards
make their need for rapid restoration commensurate
with that of AONs. Thus, the techniques described herein
find applicability in any network having a need for rapid
service restoration.

Referring now to FIG. 5, a network 40 includes a plurality
of nodes N1-N6, generally denoted 42. Each of the nodes 42
processes a predetermined number of communication channels
(e.g., channel 48a) coupled thereto via respective ones
of communication links 48. Each of the nodes 42 includes a
response processor 43 which processes incoming messages
to the node (InMessages) and local node status information
to determine the response of the node 42 which receives the
incoming messages. Each of the channels may terminate or
originate at certain nodes 42 and each channel has a specific
direction (i.e. signals are transmitted in a particular direction
in each communication channel). Thus, with respect to a
particular channel, nodes can be referred to as being
upstream or downstream of one another.

For example, in one communication channel the node N1
is upstream of the node N2 and the node N1 is downstream
of the node N6. In another communication channel,
however, the node N1 may be downstream of the node N2
and the node N1 may be upstream of the node N6. Each node
42 is able to detect and recognize attacks being levied
against it, receive and process messages arriving to it and
generate and transmit messages to nodes which are upstream
or downstream of it on certain channels.

It should be noted that for the purposes of the present
invention, a node may correspond to a single network
component. Alternatively a single network component may
be represented as more than one node. For example, a switch
may be represented as several nodes, one node for each
switching plane of the switch. Likewise, in some applications
it may be advantageous to represent a multichannel
amplifier as a single node while in other applications it may
be advantageous to represent the multichannel amplifier as
multiple nodes. Alternatively still, a cascade of in-line
amplifiers may be modeled as a single node because they
have a single input and a single output.

With the techniques described herein, those of
ordinary skill in the art will appreciate how to advantageously
represent particular network components and when
to represent multiple components as a single node and when
to represent a single component as a network node. In
making such a determination, a variety of factors are considered
including but not limited to the ability of a node or
network element or component to detect a failure (e.g. it may
be preferable to not represent an element as a node if the
element can't detect a failure) and the importance in any
particular application of having the ability to specifically
localize a network element (e.g. in some applications it may
be desirable to localize an attack to a node which includes
many elements while in other applications it may be desirable
or required to localize an attack to a particular element
within a node). This depends, at least in part, on where the
processing capability exists within a network. Depending
upon the particular application, other factors may also be
considered.

Each of the nodes 42 has one or more inputs $I_{ij}$ and
outputs, with corresponding directed connections denoted
as (i, j) when the connection is made from node i to node j
by a link. An undirected connection between nodes i and j
is denoted herein as [i, j]. In FIG. 5, node inputs and outputs
are designated 49 and for simplicity and ease of description
each of the nodes 42 includes a single input and a single
output generally denoted 48. The notation $T_{12}$ indicates the
time required to transmit a message on a channel between
nodes 1 and 2 on which channel information flows in a
direction from node 1 to node 2. Those of ordinary skill in
the art will appreciate, of course, that in practical networks
many of the nodes will have multiple inputs and outputs.

Network 40 and the networks referred to and described
herein below are assumed to be acyclic (i.e. the particular
communication path along which the information is transmitted
contains no cycles).

In general overview, the network 40 operates in accordance
with the present invention in the following manner. A
distributed processing occurs in the nodes 42 to provide a
technique which can rapidly ascertain which one or ones of
the nodes 42 are sources of an attack. It should be noted that
the nodes 42 include some processing capability including
means for detection of failures. The ability to provide the
nodes with such processing capability is within the skill of
one of ordinary skill in the art. Thus, the nodes 42 can detect
failures with satisfactory false positive and false negative
probabilities. The ability to localize attacks in the network is
provided in combination by the distributed processing which
takes place in the network.

The techniques of the present invention for attack localization
are, therefore, distributed and use local communication
between nodes up- and down-stream. Each node 42 in
the network 40 determines if it detects an attack. It then
processes messages from neighboring nodes 42 to determine
if the attack was passed to it or if it is the first node to sustain
an attack on a certain channel. The first node affected by an
attack is referred to as the source of the attack, even though
the attack may have been launched elsewhere. The global
success of localizing the attack depends upon correct message
passing and processing at the local nodes.

In describing the processing which takes place at particular
nodes, it is useful to define some terms related to the timing
of such processing. Time delays for processing and transmission
of messages at each of the nodes 42 are denoted as
follows:

$T_i^{\mathrm{meas}}$ = measurement time for node i including time to
format and send messages (where the measurement time is
the time required to detect an attack);

$T_i^{\mathrm{proc}}$ = processing time for node i including time to
format and send messages (where the processing time is the
time required to process received messages); and

$T_{ij}$ = time to transmit a message from node i to node j on
arc (i, j).

In some instances described herein below, the time delays
at all nodes are identical and thus the measurement and
processing times are denoted as $T^{\mathrm{meas}}$ and $T^{\mathrm{proc}}$ without
subscripts.
One concept included in the present invention is the
recognition that, in order for a node to determine whether or
not it is the source of an attack, it need only know whether
a node upstream of it also had the same type of attack. For
example, suppose that node 1 is upstream of node 2 on a
certain channel 48a which is ascertained as being an attacking
channel and that both node 1 and node 2 ascertain that
the attacking channel is channel 48a. Suppose further that
both nodes 1 and 2 have measurement and processing times
$T^{\mathrm{meas}}$ and $T^{\mathrm{proc}}$. If node 1 transmits to node 2 its finding
that the channel 48a is nefarious, then the interval between
the time when the attack hits node 2 and the time when
node 2 receives notice from node 1 that the attack also hit
node 1 is at most $T^{\mathrm{meas}}$, since the attack and the message
concerning the attack travel together.

Moreover, the detection of the attack commences at node 2
as soon as the attack hits. Hence, the elapsed time from when
the attack hits node 2 until node 2 detects the attack and
determines whether node 1 also saw that attack is
$T^{\mathrm{meas}} + T^{\mathrm{proc}}$. It should be noted that this time is
independent of the delay in the communications between
nodes 1 and 2 because the attack and the message concerning
the attack travel together, separated by a fixed delay. If the
attack hits several nodes, each node only waits time
$T^{\mathrm{meas}} + T^{\mathrm{proc}}$ to determine whether or not it is the first
node to detect that attack, i.e. whether it is the source of the
attack.
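
The timing argument above can be illustrated with a minimal sketch. It assumes identical measurement and processing times at both nodes, integer time units, and an attack that reaches node 1 at time zero; the function name decision_delay and its parameters are hypothetical and are not part of the described technique.

```python
def decision_delay(t_link: int, t_meas: int, t_proc: int) -> int:
    """Time from the attack reaching node 2 until node 2 knows whether
    node 1 also saw the attack. All times share one (assumed) unit."""
    attack_at_node1 = 0
    attack_at_node2 = attack_at_node1 + t_link
    # Node 1 finishes measuring, then its notice follows the data to node 2.
    notice_at_node2 = attack_at_node1 + t_meas + t_link
    # Node 2 starts measuring as soon as the attack hits it.
    measured_at_node2 = attack_at_node2 + t_meas
    # Node 2 needs both its own measurement and the notice before processing.
    decided = max(notice_at_node2, measured_at_node2) + t_proc
    return decided - attack_at_node2


# The delay stays at t_meas + t_proc however long the link is.
for t_link in (1, 50, 5000):
    print(t_link, decision_delay(t_link, t_meas=2, t_proc=3))
```

In this sketch the link delay cancels out of the result, which is the sense in which the localization delay does not depend on transmission delay.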

To illustrate the technique, it is useful to consider a 
relatively simple attack localization problem. In this net- 
work nodes can either have a status of 1 (O.K.) or 0 (alarm). 
Nodes monitor messages received from nodes upstream. Let 
the message be the status of the node. When an attack occurs 
in this network, the goal of the techniques set forth in 
accordance with the present invention is that the node under 
attack respond with an alarm and all other nodes respond 
with O.K. 
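
The goal stated above can be sketched for a single channel as follows. The sketch assumes a linear chain of nodes ordered from upstream to downstream, a propagating attack, and a rule in which a node that measured the attack treats an alarm message from its immediate upstream neighbor as evidence that it is not the source. The names localize, OK and ALARM are hypothetical.

```python
OK, ALARM = 1, 0  # status values as used in the text: 1 (O.K.), 0 (alarm)


def localize(detected):
    """detected[k] is True if node k (ordered upstream to downstream)
    measured the attack. Returns the response of each node."""
    responses = []
    for k, hit in enumerate(detected):
        # The only message a node needs is its upstream neighbor's status.
        upstream_alarm = k > 0 and detected[k - 1]
        responses.append(ALARM if hit and not upstream_alarm else OK)
    return responses


# The attack enters at node 2 and propagates to nodes 3 and 4.
print(localize([False, False, True, True, True]))  # [1, 1, 0, 1, 1]
```

Only the first node on the channel to see the attack responds with an alarm; every other node responds with O.K.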

During the processing, once an attack is detected at a
node, node 2 in network 40 for example, node 2 initiates
processing to ascertain the source of the attack by transmitting
its own node status to other nodes and receiving the
status of other nodes via messages transmitted to node 2
from the other nodes. It should be noted that the nodes from
which node 2 receives messages may be either upstream or
downstream nodes. In response to each of the messages
received by node 2 which meet predetermined criteria (e.g.
the messages are received at node 2 within a predetermined
period of time such as $[t - T^{\mathrm{wait1}}, t + T^{\mathrm{wait2}}]$), node 2
transmits response messages which provide information
related to the identity of the attack source. It should be noted
that in some embodiments the response can be to ignore
messages. Similarly, each of the nodes 42 in network 40
receives messages and, in response to particular ones of the
messages, the nodes provide information related to the
identity of the source of the attack. The particular response
messages will vary in accordance with a variety of factors
including but not limited to the particular network
application and side effects such as loopback, re-routing and
disabling of messages.

In performing such processing, each of the nodes 42 
receives and stores information related to the other nodes in 
the network 40. Thus, the processing to localize the attack
source is distributed throughout the network 40.

With the above distributed approach, if node 2 is down- 
stream from node 1 and node 2 detects a crosstalk jamming 
attack on the first channel and node 2 has information 
indicating that the node 1 also had a crosstalk jamming 
attack on a second different channel, node 2 can allow node 
1 to disconnect the channel subject to the attack. Once node 
1 disconnects the channel subject to the attack, the channel 
subject to the attack at node 2 ceases to appear as an 
offending channel at node 2. 
If node 2 did not have information from node 1 indicating
that the channel at node 1 was subject to attack at node 1 
then node 2 infers that the attacker is attacking node 2 on the 
channel on which it detected the attack. Node 2 then 
disconnects the channel. It should be appreciated that node 
2 sees no difference between the cases where channel 1 is the 
attacker at node 1 and where channel 2 is the attacker at node 
2. In both cases, channel 2 appears as the attacker at node 2. 
Thus, by using knowledge from the operation of node 1 
upstream of node 2, node 2 can deduce whether the attack 
originated with channel 1 or channel 2 thereby avoiding the 
result of erroneously disconnecting a channel which is not 
the source of an attack. Thus, the technique of the present 
invention allows the network to recover properly from 
attacks by identifying attacks carried out by network traffic 
and localizing those attacks. 

As mentioned above, each of the nodes 42 can detect an 
attack or fault within acceptable error levels. The types of
faults detected are included in a set of fault types denoted F
stored within a node storage device. One of the fault types 
in F is always a status corresponding to a no fault status 
meaning that the node has not detected a fault. 

The status of a node at a time t is denoted:

$S_t(i) \in F$

in which:

S(i) is the current status of node i;
t is the time to which the current node status applies; and
F is the set of all faults to which the status must belong
(i.e. the current node status must be a status included in
the set of faults F).

Considering a connection between the nodes i and j along
the arc (i, j), a message from node i to node j at time t is
denoted $\overrightarrow{M}_t(i,j)$. Messages can be sent upstream or
downstream in the network 40. The upstream message from node
j to node i at time t is denoted $\overleftarrow{M}_t(i,j)$. For particular
network applications the information encoded in messages
varies but typically includes the node status information.
Generally, however, the message can include any information
useful for processing. It is, however, generally preferred that
messages remain relatively small for fast transmission and
processing. That is, for each application there is defined a
particular maximum message length. The particular message
length in any application is selected in accordance with a
variety of factors including but not limited to the type of
encoding in the message, etc. Moreover, the number and
lengths of messages should be independent of network size.
This allows the system to be scalable with respect to distance
and/or number of nodes. If large messages based on network
size are utilized, this results in loss of the scalability
characteristic of the invention because of the long processing
times which would result. For example, a node can transmit
its status upstream and downstream via the messages
$\overrightarrow{M}_t(i,j) = S_t(i)$ and $\overleftarrow{M}_t(i,j) = S_t(j)$.
Message $\overrightarrow{M}_t(i,j)$ arrives at node j at time $t + T_{ij}$ and
likewise message $\overleftarrow{M}_t(i,j)$ arrives at node i at time
$t + T_{ij}$. Again, the notation $\overrightarrow{M}_t(i,j)$ and
$\overleftarrow{M}_t(i,j)$ indicates the current message
from node i to j and node j to node i, respectively.

A response function, R denotes processing of incoming 
messages and local status information to determine the 
response of the node which received the incoming message. 
The response function R will be discussed further below in 
the context of particular techniques implemented in accor- 
dance with the present invention. 
In accordance with the invention, it has been recognized
that it is necessary to explicitly take into account the time
taken by the different processes involved in the identification
and localization of attacks. The identification of an attack
requires time for detection of the input and output signals
and processing of the results of that detection. There is also
delay involved in generating messages to upstream and/or
downstream nodes. All the time required by all of the above
processes executed in sequence is referred to as the processing
time at the node. Thus, the processing time at node i is
denoted as $T_i^{\mathrm{meas}}$. Messages from node i to node j take at
most time $T_{ij}$ to transmit. Message transmission follows the
transmission of the data itself, and does not usually add to
the overall time of localizing the attack. Lastly, there are
delays due to the time for capturing messages from upstream
and/or downstream nodes, the time to process these messages
together with local information and the time to generate
new messages. We denote the time required by this last
set of events as $T_i^{\mathrm{proc}}$.

Thus, in accordance with the present invention, a network
or network management system provides techniques for: (a)
localization of the source of an attack to enable automatic
recovery; (b) relatively fast operation (implying near constant
operational complexity); (c) scalability, in that the delay
must not increase with the size and span of the network; and
(d) robustness, i.e. valid operation under any attack scenario
including sporadic attacks.

FIGS. 6-9, 10, 11, 13 and 13A are a series of flow
diagrams which illustrate various aspects of the processing
performed by various portions of network 40 to provide a
communications network which utilizes a distributed technique
for performing fault isolation. The rectangular elements
(typified by element 50 in FIG. 6), herein denoted
"processing blocks," represent computer software instructions
or groups of instructions. The diamond shaped elements
(typified by element 54 in FIG. 6), herein denoted
"decision blocks," represent computer software instructions,
or groups of instructions which affect the execution of the
computer software instructions represented by the processing
blocks. The flow diagrams do not depict syntax of any
particular programming language.

Alternatively, the processing and decision blocks represent
steps performed by functionally equivalent circuits such
as a digital signal processor circuit or an application specific
integrated circuit (ASIC). The flow diagrams do not depict
the syntax of any particular programming or design language.
Rather, the flow diagrams illustrate the functional
information one of ordinary skill in the art requires to
fabricate circuits or to generate computer software to perform
the processing required of the particular apparatus. It
should be noted that many routine program elements, such
as initialization of loops and variables and the use of
temporary variables are not shown.

It will be appreciated by those of ordinary skill in the art
that unless otherwise indicated herein, the particular
sequence of steps described is illustrative only and can be
varied without departing from the spirit of the invention.
That is, unless otherwise noted or obvious from the context,
it is not necessary to perform particular steps in the particular
order in which they are presented hereinbelow.

Turning now to FIG. 6, each node in a network (such as
nodes 42 in network 40 described above in conjunction with
FIG. 5) repeats the following process steps and responds
accordingly depending upon the node status and the status
condition indicated in the messages received by the nodes.
Processing begins with Step 50 in which the status of a
node N1 is computed at a time t. Processing then proceeds
to Step 52 where the node transmits a message including the
node status information to nodes downstream. As shown in
decision block 54 if the node status is not an alarm status,
then processing ends.

If in decision block 54 decision is made that the status is
an alarm status then processing flows to decision block 56
where the node determines if any alarm messages have
arrived at the node in a pre-determined time interval. In one
particular embodiment the predetermined time interval
corresponds to the period of time between when the node status
of node N1 is computed, denoted as T, and the measurement
time after time T, denoted as $T + T^{\mathrm{meas}}$. The predetermined
period of time thus corresponds to $T^{\mathrm{meas}}$.

If the node N1 has not received any alarm messages
in the pre-determined time interval, then processing flows to
Step 58 where the node's status is set as alarm (i.e. s=0) and
processing ends. If, on the other hand, the node received an
alarm message within the pre-determined time interval, then
processing flows to processing block 60 where the node
status is set as okay (e.g. s=1). Processing then ends.

From the above processing steps it can be seen that no
node will generate an alarm until at least one attack is
detected. When an attack occurs only the first node
experiencing the attack will respond with an alarm. All nodes
downstream from the first node receive messages which
indicate that the node upstream experienced an attack. Thus,
nodes downstream from the attack will respond with O.K.
This network response achieves the goal of attack
localization.

Referring now to FIG. 7, a general processing technique
to ascertain fault types at one node and transmit the fault
types to adjacent nodes in a network begins in processing
block 62 where a node i computes a node status S(i) at a time
t. While computing the node status any faults in the node can
be ascertained and reflected in the status. Processing then
flows to Step 64 where a response function R to be included
in a message M is computed. The response function R is
computed from the node status S(i). The node response is
determined by the response function R which processes the
node status information S(i) without regard to incoming
messages.

Processing then flows to processing block 66 where
messages M which include the response function R are
transmitted on arcs leaving the node. In a preferred
embodiment, the messages are transmitted on all arcs leaving
the node. Processing then flows to processing Step 68
where the node collects messages arriving at the node within
a pre-determined time interval. In a preferred embodiment,
the predetermined time interval corresponds to
$[t - T^{\mathrm{wait1}}, t + T^{\mathrm{wait2}}]$. The wait times $T^{\mathrm{wait1}}$, $T^{\mathrm{wait2}}$ are
selected to result in each node having equal final processing
times which can be equal to the maximum time required by
any of the response functions.

Processing then flows to Step 70 where responses for
inclusion in messages M are computed in accordance with a
pre-determined response function R selected in accordance
with the node status and the messages received within the
predetermined time interval. Additional action can be taken
by the node, such as switching direction of communication.
Resulting messages are then transmitted on arcs leaving the
node as shown in Step 72. In a preferred embodiment the
messages are transmitted on all arcs which leave the node.
Processing then ends.

The general processing technique, like the simple
example of attack localization discussed above in conjunction
with FIG. 6, is a distributed algorithm that achieves its
goal through local processing and message passing. The goal
of the algorithm can vary for different network examples.
For example, the goal may be to raise an alarm as in the
process of FIG. 6. A more complex goal may be to reroute
the node immediately before and after the attacked node in
the network. The techniques of the present invention are
general enough to be suitable for a wide range of network
goals. It should be noted that the particular processing steps
performed in the nodes (such as the set of faults, the format
of the messages and the node response to input messages)
are defined for the particular network application and that
one of ordinary skill in the art will appreciate how to provide
nodes having the necessary capabilities in a particular
application.

The above technique thus ascertains the fault type and
transmits it to adjacent nodes in the network. It then monitors
incoming messages for a specified (bounded) time
interval and responds to these messages. The response of the
network is particular to the particular network application.
To achieve a particular network application, a fault set F
must be defined, the waiting time interval for messages (i.e.
$T^{\mathrm{wait1}}$ and $T^{\mathrm{wait2}}$) must be defined, the format of messages
must be defined, the response function R must be defined
and the mode of message passing must be defined. The node
can remove messages it receives from the message stream or
pass all messages in the message stream.

The response function R is responsible for achieving the
data transmission and security goals of the network. R is a
function whose domain is the cross product of the set of
node statuses and the set of message lists and whose range
is the set of message lists. This may be expressed in
mathematical form as:

$\mathrm{Status} \times \mathrm{MessageList} \rightarrow \mathrm{MessageList}$

in which:

Status corresponds to the set of node statuses;
MessageList corresponds to the set of message lists (0 or
more messages in whatever format is being used);
$\times$ denotes a mathematical cross product; and
$\rightarrow$ denotes a mapping to a result space.

The response function R is preferably selected to be very
fast to compute in order to provide a relatively rapid
technique. Ideally, the response function R should be provided
from one or more compare operations and table
lookup operations performed within a network node. With
this approach, any delay in identifying faults and attacks is
relatively short and the network provides minimal data loss.

As mentioned above, messages can move upstream or
downstream in the network. The response function receives
all the messages at a node as input. It processes these
messages to generate the messages for transmission from the
node. The response function generates messages which the
node transmits up- and down-stream. As will be discussed
below, the response function R can be defined to handle a
variety of different network recovery applications including
but not limited to loopback recovery and automatic protection
switching (APS) recovery.

In addition, the response function R may have a side effect
response, such as raising an alarm or re-routing traffic at a
node.

Each node, i, in a network can have a different response
function denoted as $R_i$. The use of different response
functions, with varying processing times, may, however,
result in race conditions in the network. In general, timing
problems due to different response functions can be avoided
by forcing all response functions in a network to operate in
the same amount of time. Thus in one approach, the processing
time is set to be the maximum time required by any
of the response functions. Moreover, a wait time can be
added to each response function such that its final processing
time is equal to the maximum time. It should be noted that
the response function R may return no message, or the
empty set, in which case no messages are transmitted.
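
The idea of padding every response function up to a common processing time can be sketched as follows. This is an illustration only: the function name added_waits, the node labels, and the use of integer time units are assumptions made for the example.

```python
def added_waits(response_times):
    """Given each node's response-function time, return the wait time to
    add at each node so that all nodes finish in the same total time."""
    t_max = max(response_times.values())
    # Pad each node up to the slowest response function in the network.
    return {node: t_max - t for node, t in response_times.items()}


waits = added_waits({"N1": 3, "N2": 5, "N3": 4})
print(waits)  # {'N1': 2, 'N2': 0, 'N3': 1}
```

With these waits every node completes its response in the same time as the slowest response function, which is one way to avoid the race conditions mentioned above.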

With reference to FIG. 7A, the problem of basic attack
localization discussed above in conjunction with FIG. 6 is
re-considered. Recall that, for this problem, the nodes have
two fault types: no fault and fault (i.e. the fault set F includes
a value of 1 denoting no fault at the node and a value of zero
denoting fault at the node), the status S of a node i is denoted
S(i) and the status must be set to a value in the fault set F,
and messages from any node encode the status of the
node. The goal for node i is to determine whether it is the
source of the attack or if the attack is being carried by the
data from a node upstream. Each node in the network repeats
the processing steps shown in FIG. 7A.

In the general technique, the waiting times are set as
follows: $T^{\mathrm{wait1}} = 0$ and $T^{\mathrm{wait2}} = \max_i(T_i^{\mathrm{meas}})$. Also, the
message passing parameter is set to remove all messages
received. The response function R is described below in
conjunction with FIG. 7A. In FIG. 7A, it is assumed that the
node inputs are the node status s and all messages received
within a predetermined time interval (denoted as
InMessages).

Turning now to FIG. 7A, processing begins in Step 74
where the node i generates a node status and receives all 
messages which arrive at the node in a predetermined time 
interval (denoted as InMessages). If a fault is recognized, 
then the node status will reflect this. Processing then flows 
to decision block 76 where a decision is made as to whether 
this is the response based on this node's status only or the 
status of this node together with messages received from the 
upstream node in a predetermined period of time. 

If processing this node status only, then processing flows 
to step 77 where the node status is returned and the pro- 
cessing ends. If processing received messages, then process- 
ing flows to decision block 78 where a decision is made as 
to whether the node status is a fault or a no fault status (i.e.,
no fault equals 1, fault equals 0).

If in decision block 78 the node status received in step 74
is not a fault status, then processing flows to decision block
79 where a decision is made as to whether a message
received from an upstream node j at the node i is an alarm
message. If a decision is made that this is an alarm message,
then processing flows to Step 82 where the node returns a
node status value of 1. If a decision is made that the message
received from the node j is not an alarm message, then
processing flows to Step 80 where the node returns a value
of 0 and processing ends.

In localizing the attack, it is useful to look at the dynamics
between two nodes and the connection between them. Each
node monitors every connection into it. In one relatively
simple example, a connection between nodes i and j, with
the data flowing from i to j, is examined.

Defining the time at which the data leaves node i as time
t=0, the message from node i to node j about node i's failure
is sent at time T_i^meas. Node j receives the data at time T_ij, and
completes the measurement and sends its status at time
T_ij + T_j^meas. At this time node j has detected an attack or it has
detected no attack. Node j receives the message from node
i at time T_ij + T_i^meas. Thus, node j can begin to process the
status message from i at time T_ij + max(T_i^meas, T_j^meas). At
this time node j has information indicating whether or not
node i detected an attack, and node j has enough information
to determine whether or not it is the source of the attack.
Processing at node j falls into one of four cases: (1) if node
j has detected no attack, then node j concludes that it is not
the source of an attack; (2) if node j has detected an attack
and node i has detected no attack, then node j concludes that
it is the source of the attack; (3) if node j has detected an
attack and node i has detected an attack, then node j
concludes that it is not the source of the attack; and (4) if
node j has detected an attack and has not received any
messages from node i at time t = max(T_i^meas, T_j^meas) + T_ij, then
node j concludes that it is the source of the attack.
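
The four cases at node j can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation of the described technique: the function name and the encoding of node i's report (True, False, or None when no message arrived by the deadline) are assumptions introduced here.

```python
from typing import Optional


def is_attack_source(j_detected: bool, i_reported: Optional[bool]) -> bool:
    """Decide whether node j concludes that it is the source of the attack.

    j_detected: True if node j's own measurement shows an attack.
    i_reported: True/False if upstream node i reported attack/no attack,
                None if no message from node i arrived by the deadline
                (hypothetical encoding chosen for this sketch).
    """
    if not j_detected:
        return False      # case (1): no attack seen at j
    if i_reported is None:
        return True       # case (4): attack at j, no message from i in time
    return not i_reported  # case (2): i clean -> source; case (3): i attacked -> not source


for j_detected, i_reported in [(False, True), (True, False), (True, True), (True, None)]:
    print(j_detected, i_reported, is_attack_source(j_detected, i_reported))
```

Only case (2) and case (4) make node j the source; every other combination places the source elsewhere or finds no attack at node j.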

It should be noted that node j completes processing at a
time corresponding to T_ij + max(T_i^meas, T_j^meas) + T_j^proc. An
exhaustive enumeration of the possible timing constraints
involving T_i^meas, T_j^meas, T_j^proc, T_ij and a length of the attack
L shows that, owing to the technique of the present invention,
node j is never in the wrong state where the state is given
with a delay, i.e. the node concludes at time t + max(T_i^meas,
T_j^meas) + T_j^proc that it is the source of an attack if and only if
it is the source of an attack at time t.

FIGS. 8 and 9 illustrate a scenario in the processing to
localize an attack when the propagating attack does not
disrupt all nodes.

Referring now to FIG. 8, a portion of a network which
illustrates a scenario in which the technique described in
FIG. 9 may be used is shown. In this scenario, an attack is
carried by a signal but the attack may not be detectable in
some nodes. In this particular embodiment, consideration is
given to a specific attack scenario due to crosstalk. This
scenario should be distinguished from a scenario in which it
is assumed that, in the case of an attack which is carried by
the signal, all the nodes through which the signal is
transmitted will be affected by the attack, i.e. they will suffer a
failure. In the case where all nodes are affected by the attack,
the basic attack localization technique described in
connection with FIG. 7A can localize the source of such attacks.
In the scenario where the attack is not detectable in some
nodes, as the signal traverses down the network it attacks
some nodes, then reaches a node which it does not attack, and
propagates through that node to attack downstream nodes.
For example, turning now to FIG. 8, consider an attack on
channel 86a at node 84, a switch, in the network nodes of
FIG. 8. Because of the finite isolation characteristics between
two channels propagating through the switch 84 and the
resulting crosstalk at node 84, the output of channel 86b at
node 84 is affected by the attack. The signal in channel 86b
then propagates to node 90, which is an amplifier. Since this
signal is the only input to the node 90, gain competition is
not possible so the node 90 does not detect an attack. At node
92, however, channel 86c is once again affected by crosstalk
from the attack, thus an alarm is generated. The attack does
propagate. It is detected in nodes 84 and 92, but it is not
detected at intermediate node 90.

It is thus desirable to apply the attack localization
technique of the present invention to this problem of not all
nodes detecting an attack. To isolate the salient issues, the
simplest framework within which this problem can occur is
considered. Nodes 84, 90, 92 have two fault types. The first
fault type is no fault (i.e. F=1) and the second fault type is
fault (i.e., F=0). The message simply contains a status: fault
or no fault. The goal of the technique is unchanged: node 84
must determine whether it is the source of the attack or if the
attack is being carried by the data from a source upstream.
The difference between this problem and the basic attack
localization problem is that each node 84, 90, 92 must know
of the status at all the nodes upstream from it in the network,
whereas in the basic attack localization problem it is
assumed that when an attack propagates, every node in the
network detects a fault so the status from the single preceding
node contains sufficient information from which to draw
conclusions. Instead of generating messages at each node,
the data is followed from its inception by a status message
which lags the data by a known delay. The status message is
posted by the node at which the communication starts. Once
an attack is detected the status message is disabled. The lack
of a status message indicates to all the nodes downstream
that the source of the attack is upstream of them. Note that
such a status message is akin to a pilot tone which indicates
that an attack or a fault has occurred.

With the above scenario in mind, one can define the
response function R for selective attack localization, as
expressed below and in conjunction with FIG. 9. It should be
noted that the processing of FIG. 9 assumes that inputs to
each node are the status S and all messages received within
a predetermined period of time.

Before describing the processing steps, it should be noted
that the nodes in the network never generate messages. They
can, however, disable the status message when they detect
an alarm. When the status message is disabled, any node
downstream can conclude that it is not the origin of the
attack.

In the general technique the waiting times T^wait1 and
T^wait2 are set as T^wait1 = 0 and T^wait2 = max_i(T_i^meas) and the
message passing mode is to transmit all messages.

Referring now to FIG. 9, the response function for selective
attack localization is shown. Processing begins in Step
94 where data at the source node of the communication is
generated for transmission to the one or more destination
nodes. Processing then flows to Steps 96 and 98 where the
data is first transmitted to the next nodes and then a status
message is transmitted to the next nodes. It should be noted
that the status message lags the data message by a
predetermined amount of time.

Processing then flows to step 100 where the data is
received at the nodes. Immediately upon receipt of the data,
the node can begin processing the data and can conclude that
an attack occurred prior to processing step 102 where the
messages are received at the nodes. It should be noted that
the messages have a delay which is smaller than T^wait2.

Processing then proceeds to decision block 103 where a
decision is made as to whether an attack has been detected
at two nodes (i.e., the processing node and some other node).
If a decision is made that two nodes have not detected an
attack, then processing proceeds to processing step 106
where the node is determined to not be the source of the
attack. Processing then flows to decision block 107 which
will be described below. If, on the other hand, in decision
block 103 a decision is made that an attack has been detected
at two nodes, then processing flows to decision block 104.

In decision block 104 it is determined if the status
message is enabled. If a node determines there is an attack,
the node disables the message. If the status message has
been disabled, then processing flows to processing block
106. If, on the other hand, the decision is made that the status
message is enabled, then processing flows to step 105 where
the status message is disabled thus indicating that this node
is the source of the attack.

Processing then flows to decision block 107 where a
decision is made as to whether this node is the destination node.
If the node is not the destination node, then the data and the
status message (if not disabled) are transmitted to the next
nodes as shown in processing blocks 108 and 110 and
processing returns to Step 100. Steps 100 through 110 are
repeated until the destination node receives data and the
message. If the node is the destination node, then processing
ends.

Suppose a node i is attacked at time t. The node turns off
the status message at time t + T_i^meas + T_i^proc. The next node,
e.g. node j, receives the data stream at time t + T_ij and waits
until time t + T_ij + T^wait2. For an all-optical network, switching
off a channel can be done in the order of nanoseconds with
an acousto-optical switch. The delay between nodes in the
network would typically be larger, and thus it is not believed
this condition will be problematic in practice. Moreover, the
network can be designed to ensure this condition is met by
introducing delay at the nodes. Such a delay is easily
obtained by circulating the data stream through a length of
fiber.

Response to multiple fault types can be handled efficiently
with a lookup table. In the case of multiple fault types, the
response function R would have a pre-stored table L. Given
the current node status, s_i, and the status of the previous
node, s_j, the lookup table provides the appropriate response
for this node, r_i (i.e., L: Status of node i x Status of node
j -> Response). For some applications it is useful to have
different lookup tables for the next node in the network, L_n, and
the previous node in the network, L_p. Furthermore, the
look-up tables can be extended to the domain of Status x
Response which gives greater flexibility.

FIGS. 10 and 11 illustrate the processing required to
repress alerts and to reduce the occurrence of alarm
recovery. Alarm recovery refers to the steps required to route
around a node or physically send a person to fix the node,
typically using manual techniques such as physical repair or
replacement of circuit components. Thus it is very expensive
to perform alarm recovery.

Consider a node which detects signal degradation. The
signal may be amplified sufficiently by the next node
downstream to remain valid when it reaches the destination. Since
a valid signal reaches the destination node, it may thus be
undesirable to stop transmitting the signal or to re-route the
node that detected this problem. Instead, it may be
preferable to continue network operation as usual and generate an
alert signal, but not an alarm signal. There thus exist three
possible response values: (1) node status value is no fault or
O.K. (e.g. S=1); (2) node status value is fault or alarm (e.g.
S=0); and (3) node status value is alert. Thus, the source of
an attack which is not corrected generates an alarm signal or
alarm node status value whereas the source of a corrected
attack generates an alert signal or alert node status value.

The attack localization technique of the present invention
can achieve this behavior using upstream messages. Each
node must send status messages upstream as well as
downstream. Upon detecting an attack in a node, downstream
messages are checked to determine if this node is the source
of the attack. Upstream messages are checked to determine
if the attack persists in the next node downstream. When a
node detects an attack it first generates an alarm. If it later
finds that the problem was corrected downstream it
downgrades its alarm to an alert.

The response function for a network operating in
accordance with the above concepts is described in conjunction
with FIGS. 10 and 11. FIGS. 10 and 11 illustrate the
processing which takes place at first and second nodes in a
network. The second node (node 2) is downstream from the first
node (node 1). FIG. 10 illustrates the processing which takes
place at node 2 and FIG. 11 illustrates the processing which
takes place at node 1. Each of the nodes repeats the
respective processing steps described in conjunction with
FIGS. 10 and 11.

Referring now to FIG. 10, processing begins in Step 112
where a node status is computed and proceeds to Step 114
where the node status is transmitted to upstream and
downstream nodes. The processing then ends.

In FIG. 11, processing begins in processing block 120
where a node status is computed. Processing then flows
to decision block 122 which determines if the node status is
an alarm status. If in decision block 122 it is determined that
the node status is not an alarm or a fault status, then
processing ends. If, on the other hand, in decision block 122
it is determined that the node status is an alarm or a fault
status, then processing flows to step 124 where the node
receives messages from a downstream node (e.g. a node 2)
within a pre-determined period of time. Processing then
flows to decision block 126 where a decision is made as to
whether a message from the second node (node 2) is an
alarm message. If the message is not an alarm message, then
processing flows to Step 130 where the response for node 1
is indicated to be an alert signal or an alert node status value
and processing then ends. If in decision block 126 a decision
is made that the message from node 2 is an alarm, then
processing flows to Step 128 where the response for node 1
is an alarm signal or an alarm node status and processing
then ends.
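
The FIG. 11 flow at node 1 can be sketched as follows. This is an illustrative sketch under stated assumptions: the `Status` enumeration and the function name are hypothetical, and returning the node's own status unchanged when it is not an alarm is an assumption (the text only says that processing ends in that case).

```python
from enum import Enum, auto


class Status(Enum):
    # Hypothetical labels for the three response values named in the text.
    OK = auto()
    ALERT = auto()
    ALARM = auto()


def node1_response(own_status: Status, downstream_status: Status) -> Status:
    """Response of node 1 given the status message received from node 2."""
    if own_status is not Status.ALARM:
        # Decision block 122: not an alarm, processing ends (assumed: status unchanged).
        return own_status
    if downstream_status is Status.ALARM:
        # Decision block 126 -> Step 128: attack persists downstream, keep the alarm.
        return Status.ALARM
    # Step 130: the problem was corrected downstream, downgrade to an alert.
    return Status.ALERT


print(node1_response(Status.ALARM, Status.OK))
print(node1_response(Status.ALARM, Status.ALARM))
```

The downgrade only ever happens at a node that first raised an alarm, which matches the statement that a node "first generates an alarm" and later downgrades it.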

Upstream messages may follow the data stream by a
significantly longer time than do downstream messages. An
upstream message requires time for the data to traverse a
link (i, j) from node i to the next node, j. The status of node
j must be measured, and the message from node j to node i
must traverse the link (i, j). Therefore the waiting time
T^wait2 in the attack localization technique is longer when
upstream messages are monitored. In particular, for this
scenario the value of the waiting time is preferably set to a
value which takes into account such factors, such as
T^wait2 = 2*max(T_ij) + max_i(T_i^proc).

FIGS. 12-14 illustrate how the techniques of the present
invention can be used for network service restoration after
an attack for two important types of preplanned recovery
schemes: (1) automatic protection switching (APS) and
(2) loopback protection. These two preplanned recovery
schemes are the two types of network restoration used for
SONET/SDH. For each type of network restoration, a description
of how the technique can be used to perform recovery, and of
the process steps that achieve the attack localization,
is provided.

APS allows the network to receive data on the backup
stream in the event of a faulty node. In the case of an attack,
service would be maintained if the attack is detected. The
location of the attack, however, is unknown and restoring
normal network operation may require a great deal of time
or be erroneous as discussed earlier.

The attack localization technique described above in
conjunction with FIG. 7A provides the network the required
information to switch streams upon an attack or a fault.
Furthermore, the attacked node is ascertained so that the
attack can be dealt with quickly. The basic fault localization
technique can be used to determine whether an attack took
place along the primary path.

FIG. 12 shows a network 132 having a plurality of nodes
134 including a source node 134a and a destination node
134b. Source and destination nodes 134a, 134b are in
communication via a primary path 136 provided from links
136a-136d and a backup path 138 provided from links
138a-138d.

If an attack took place along the primary path 136, there
will be a message indicating the presence of such an attack
and lagging the attack by a time T^meas traveling alongside
the primary path. The destination node will therefore know
that there was an attack upstream and that the destination
node 134b was not the source of the attack. The response of
the destination node 134b will be to listen to the backup path
or stream 138.

This network requires a first response function for
destination nodes which can be denoted as R_d, and a second
response function for all other nodes denoted as R_n. The
response function R_n can be set to the response function
described above in conjunction with FIG. 7A. One purpose
of the destination node response function is to determine if
the node receives any alarm messages. If the destination
node does not receive any alarm messages then the node
may optionally transmit a node status message s. If the status
of the node is an alarm message, then the destination node
performs any necessary processes to receive data on a backup
data stream.

Since the attack localization technique relies on messages 
arriving at nodes at specific times, a problem may arise if the 
two different response functions do not obey these timing 
conditions. Since all nodes in the network except the
destination nodes use R_n, the timing up to the destination node
will not result in a race condition. Since no node is waiting 
for messages from the destination nodes, any differences in 
time will not affect nodes in the network. 

Switching routes can entail certain routing problems.
Such problems can be avoided by delaying transmission on
the backup path. The necessary delay on the backup path
may be computed as follows. Denote by T^switch the time it
takes for the destination node 134b to switch from the
primary path 136 to the backup path 138 after an alarm on
the primary path 136 has been diagnosed by the destination
node 134b. Let ΔT represent the difference in transmission
delay between the source node 134a and the destination
node 134b between the primary stream 136 and backup
stream 138. It is assumed that the transmission delay is
shorter on the primary path 136 and longer on the backup
path 138. Regardless of where the failure happened on the
primary path 136, no data will be lost in the process of
detecting the problem and switching to the backup stream
138 as long as the data on the backup stream is transmitted
with a delay of at least

max over all nodes i in the primary path (T_i^meas) + T^switch - ΔT.
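
As a worked illustration, the bound can be evaluated numerically. The sketch below reads the bound as the largest measurement time on the primary path, plus the switching time, minus the transmission-delay difference; that reading, the function name, the clamp at zero, and the sample numbers are all assumptions made for illustration rather than values taken from the description.

```python
from typing import Sequence


def min_backup_delay(meas_times: Sequence[float],
                     switch_time: float,
                     delta_t: float) -> float:
    """Smallest backup-stream delay that avoids data loss under the assumed bound.

    meas_times:  measurement time of each node on the primary path
    switch_time: time for the destination to switch to the backup path
    delta_t:     extra transmission delay of the backup path over the primary path
    """
    bound = max(meas_times) + switch_time - delta_t
    # Assumed clamp: a negative bound means the backup path is already late enough.
    return max(0.0, bound)


# Hypothetical figures, all in the same time unit.
print(min_backup_delay([2.0, 3.0, 5.0], switch_time=10.0, delta_t=4.0))  # 11.0
```

Because only the maximum measurement time enters the bound, the required delay does not depend on which node of the primary path failed.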

Those of ordinary skill in the art will appreciate of course 
that in some embodiments, the transmission delay is longer 
on the primary path 136 and shorter on the backup path 138 
and that appropriate changes in processing may be necessary 
owing to such a condition. 

If all nodes 134 have the same T_i^meas, then no matter
where the failure occurs in the primary path 136, there is
always the same delay between the data stream flowing on
the primary path 136 and the data stream flowing on the
backup data path 138 after APS. Therefore, the destination
node 134b need not adapt its response to the location of
the failure. Independence from the location of the failure is
very advantageous for scalability of the network. Moreover,
having a single delay for all nodes results in simple optical
hardware at the destination node 134b since adapting to
different delays on the fly requires significant complexity at
the destination node.

Referring now to FIGS. 13 and 13A, response function 
processing in accordance with the techniques of the present 
invention to provide loop-back restoration in the case of a 
failure is shown. It should be noted that loop-back restora- 
tion in the case of a failure is performed by the two nodes 
adjacent to the failure. 

Processing begins in decision block 140 where decision is 
made as to whether a node has received an incoming 
message. If the node has not received an incoming message, 
then processing proceeds to Step 142 where the node posts 
a status message which indicates that the node has no 
information concerning the identity of a source attack node. 
The status message may include, for example, a don't-know
flag. Processing then ends.

If in decision block 140 a decision is made that the node has
received at least one incoming message, then processing
proceeds to Step 144 where each of the at least one incoming
messages to be sent to both upstream and downstream nodes
are shown. Processing then flows to decision block 146
where a decision is made as to whether the status included in
the message is an attack status. If a decision is made that the
status is not an attack status, then processing flows to
decision block 148 where a decision is made as to whether the
downstream node detected an attack.

If the downstream node did not detect an attack then
processing ends. If, on the other hand, in decision block 148,
a decision is made that the downstream node detected an
attack, then processing flows to processing block 164 (FIG.
13A) where the node immediately upstream of the attack
node is re-routed when it receives a message having an
indicator with a value which indicates that the node is under
attack (e.g. the node receives an attack flag message). Thus,
the response processor causes a re-routing of the node
immediately upstream of the attacked node when it receives
an {attack, flag} message. The upstream nodes need not wait
for an {attack, mine} message because the attack does not
propagate upstream. Processing then ends.

If in decision block 146 a decision is made that the status is
an attack, then processing flows to decision block 150 where
a decision is made as to whether the node upstream detected
an attack. If the node upstream did not detect an attack,
processing flows to step 152 where the node posts
a status message which indicates that the node has
information concerning the identity of a source attack node. The
status message may include, for example, a mine flag
meaning that this node is the source of the attack. If, on the
other hand, a decision is made that the upstream node
detected an attack, then processing flows to step 154 where
the node posts a status message which indicates that the node
has information concerning the identity of a source attack
node. The status message may include, for example, a
not-mine flag meaning that this node is not the source of the
attack.

Processing then flows to decision block 156 where a
decision is made as to whether the node upstream is the source
of the attack. If a decision is made that the node upstream is
not the source of the attack, then processing ends. If, on the
other hand, a decision is made that the node upstream is the source of
the attack, then processing flows to step 158 where the node
immediately downstream of the attack node is re-routed.
Thus, when an {Attack, Mine} message is received from the
source node the response processor causes a re-routing of
the node immediately downstream of the attacked node.
Processing then flows to step 160 where the node posts a
not-mine flag and then to step 162 where a status message
is transmitted with whatever flag has been posted.
Processing then ends.

Referring now to FIG. 14, loopback restoration, in the
case of a failure, is performed by the two nodes adjacent to
the failure. A network 170 includes a plurality of network
nodes 172a-172g with node 172f corresponding to a source
node and node 172d corresponding to a destination node. A
data stream flows between the nodes on a primary stream or
channel 174. If node 172a experiences a failure, the primary
data stream 174 is re-routed at node 172g to travel on a
backup channel 176. Simultaneously, the node 172b receives
information on the backup stream 176. This restoration
maintains the connectivity of the ring network 170 and
allows the data to reach the intended destination despite the
failure at node 172a.

Consider an attack at node 172a, as shown in FIG. 14.
Node 172g is immediately upstream of node 172a which is
the source of the attack. Node 172b is immediately
downstream of node 172a. The attack may spread so that each of
nodes 172b-172d will detect an attack while it will not be
attacked directly. Each of these detected attacks will cause
loopback recovery, resulting in multiple loopbacks and no
data transmission. Thus, for these and other reasons
discussed above, detection of attacks as failures and utilization
of a loopback operation might not offer recovery
from attacks.

The attack localization technique can be applied to
loopback in the following way. In the event of an attack each
node 172a-172g attempts to determine whether it is
immediately upstream or immediately downstream of the attacked
node. In the network 170, the node 172g finds that node
172a is the source of an attack (by monitoring upstream
messages) and re-routes. The node 172b also finds that the
node 172a is the source of the attack and re-routes. All other
nodes 172c-172f find that they are not immediately
upstream or downstream of the attack. Thus, the nodes
172c-172f do not re-route despite the detected attack.

In the attack localization technique the wait time T^wait2
can be set to: T^wait2 = 2*max(T_ij) + max_i(T_i^proc) which gives
the node time to monitor backward messages. Messages will
consist of the couple (s, flag) where s is the status of the node
(one of O.K. or Attack), and the status flag belongs to the set
of status flags {DontKnow, Mine, NotMine}. The status flags
indicate whether the transmitting node is responsible for the
fault or not, or that the node does not yet know if it is
responsible for the fault. For this case we will remove
messages from the message stream when they are processed.
The response function R is as discussed above in
conjunction with FIGS. 13-13A.

Table 1 illustrates the messages posted at nodes 172a
(node j), 172b (node k), 172g (node i), and 172c (node l),
when an attack occurs at the node 172a. For simplicity it is
assumed that all measurement times are negligibly small and all
transmission times are equal. This allows examination of the
nodes at discrete time steps.

Let the attack at node 172a occur at time t. At this time
only node 172a detects an attack. At time t+1 node 172a
receives an O.K. message from node 172g and finds it is the
source. Node 172b detects an attack and receives an attack
message from node 172a indicating that it is not the source
of the attack. Node 172g receives the attack message from
node 172a and re-routes. At time t+2 node 172b finds that
the node 172a is the source of the attack and the node 172b
is the next node downstream and re-routes. The node 172c
also detects the attack at time t+2 and receives the {Attack,
NotMine} message from the node 172b, thereby finding that
it is not the source. Since the message indicates that the node
172b is not the source, the node 172c does not re-route.

The timing issues are important, since the nodes 172b,
172g act independently. If the node 172b performs loopback
before the traffic on the backup channel has reached the node
172b, there will simply be a delay in restoration but no data
will be lost. Loopback could fail, however, if the node 172b
performs loopback after the loopback traffic from the node
172g has arrived at the node 172b. There could be loss of
data on the backup channel upon arrival at the node 172b. It
can be shown, however, that this eventuality cannot occur.

Let t be the time at which the attack hits the node 172a.
At time t + max(T_i^meas, T_j^meas) + T_j^proc, the node 172a will
send a message to the node 172g informing it that the source
of the attack is at the node 172a. The node 172g will receive
the message that the node 172a is the source of the attack at
time t + max(T_i^meas, T_j^meas) + T_j^proc + T_ij and will finish
T_i^proc later. If
it takes the node 172g a time period corresponding to about
T_i^loop to perform loopback, then the node 172g will perform
loopback at time t + max(T_i^meas, T_j^meas) + T_j^proc + T_i^proc + T_ij +
T_i^loop. The node 172b will know that it is not the source of
the attack at time t + max(T_j^meas, T_k^meas) + T_k^proc + T_jk.
However, the information that is needed by the node 172b is
whether or not the node 172a is the source of the attack.

The node 172b will know that the node 172a is the source
of the attack and perform loopback at time t + max(T_i^meas,
T_j^meas) + T_j^proc + T_jk + T_k^loop. It may be assumed that all the
times T^loop are equal, all of the T^meas are equal and all of the
T^proc are equal. Such an assumption may be made without
loss of generality because one could take the maximum of all
these time periods and delay the others to match the
maximum. One can assume, as would be the case in AONs, that
transmission delays are proportional to length. From
elementary geometry, it is known that |T_ij - T_jk| is less than
or equal to the transmission time from the node 172g to the
node 172b. Therefore, no traffic from the node 172g to the
node 172b placed on the backup channel by loopback will
arrive at the node 172b before the node 172b has performed
loopback.
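
The loopback response of FIGS. 13 and 13A can be sketched as a small function that returns the flag a node posts and whether it re-routes. This is a hedged sketch, not the patented response function: the function name, the tuple encoding of messages, and the flag posted by a node whose own status is O.K. (NotMine once it sees an attack downstream, DontKnow otherwise, inferred from the entries of Table 1) are assumptions.

```python
from typing import Optional, Tuple

Message = Tuple[str, str]  # (status, flag), e.g. ("Attack", "Mine")


def loopback_response(own_status: str,
                      upstream: Optional[Message] = None,
                      downstream: Optional[Message] = None) -> Tuple[str, bool]:
    """Return (flag to post, whether this node re-routes)."""
    if upstream is None and downstream is None:
        return ("DontKnow", False)                    # step 142: nothing received yet
    if own_status != "Attack":
        # Block 148: an O.K. node re-routes if the node downstream detected an attack.
        reroute = downstream is not None and downstream[0] == "Attack"
        return ("NotMine" if reroute else "DontKnow", reroute)  # flag here is an assumption
    # Block 150: this node detected an attack; did the node upstream detect one too?
    upstream_attacked = upstream is not None and upstream[0] == "Attack"
    flag = "NotMine" if upstream_attacked else "Mine"
    # Block 156: re-route only if the node upstream declared itself the source.
    reroute = upstream == ("Attack", "Mine")
    return (flag, reroute)


# Node 172b one step after the attack at 172a: not the source, no re-route yet.
print(loopback_response("Attack", upstream=("Attack", "DontKnow")))
# Node 172b one step later, after 172a has posted (Attack, Mine): re-route.
print(loopback_response("Attack", upstream=("Attack", "Mine")))
```

Run against the scenario described for an attack at node 172a, the sketch has node 172g re-route at t+1, node 172b re-route at t+2, and node 172c never re-route.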
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processing messages received in a message processing 
one of the first and second nodes to determine if the 
attack was passed to the message processing node from 
another node or if the message processing node is a first 
node to sustain an attack on a certain channel. 

2. The method of claim 1 wherein prior to the step of 
transmitting the one or more messages, the method com- 
prises the step of transmitting data between said first and 
second nodes. 

3. The method of claim 2 wherein the localization of an 
attack at the message transmitting node requires a predeter- 
mined processing time at wherein the predetermined pro- 
cessing time includes: 

a detection time to detect the input and output signals and 
processing of the results of that detection, and 

a time delay associated with generating one or more 
messages for transmission to at least one of an 
upstream and a downstream node. 

4. The method of claim 3 wherein said detection time 
comprises: 

a time for capturing one or more messages from upstream 
and/or downstream nodes; and 



TABLE 1 



time 


node 172g 


node 172a 


node 172b 


node 172c 


t 

t + 1 
t + 2 
t + 3 


(OJCDontKnow) 
(OJCNotMine)* 
(O.K-^MotMiDc)* 
(CXKJMotMine)- 


(AttackJDontKnow) 
(Attack,Mine) 
(Attack,Mine) 
(Attack,Mine) 


(O.K-JJontKnow) 
(Attack, NotMine) 
(Attack,NctMine)* 
(Attack, NotMine)* 


(OiC.DontKnow) 
(OJC,DontKttow) 
(Attad^NotMinc) 
(Attack^NotMine) 



Table 1 shows messages and the side-effects of operating 
the response function R in conjunction with the processes 
and techniques described above in conjunction with FIGS. 
13, 13A when an attack occurs at time t at the node 172a. A 35 
decision to re-route is indicated with an asterisk (*). 

As indicated heretofore, aspects of this invention pertain
to specific "method functions" implementable on computer
systems. Those skilled in the art should readily appreciate
that programs defining these functions can be delivered to a
computer in many forms, including, but not limited to: (a)
information permanently stored on non-writable storage
media (e.g., read only memory devices within a computer or
CD-ROM disks readable by a computer I/O attachment); (b)
information alterably stored on writable storage media (e.g.,
floppy disks and hard drives); or (c) information conveyed
to a computer through communication media such as telephone
networks. It should be understood, therefore, that
such media, when carrying such information, represent
alternate embodiments of the present invention.

Having described preferred embodiments of the
invention, it will now become apparent to one of ordinary
skill in the art that other embodiments incorporating their
concepts may be used. It is felt therefore that these embodiments
should not be limited to disclosed embodiments, but
rather should be limited only by the spirit and scope of the
appended claims.

What is claimed is: 

1. A method for performing attack localization in a 
network having a plurality of nodes, the method comprising 
the steps of: 

determining, at each of the plurality of nodes in the
network, if there is an attack on the node; 

transmitting one or more messages between first and 
second nodes wherein said first node is upstream from 
said second node and wherein each of the one or more 
messages indicates that the node transmitting the message
detected an attack at the message transmitting
node; and 



a time to process the captured messages together with 
local information. 

5. The method of claim 4 wherein: 

the first node is upstream of the second node on a first
channel which is an attacking channel;

both the first and second nodes identify the attacking
channel;

wherein the first node transmits to a second node a finding 
that the channel is nefarious and the interval between 
the time when the attack hits the second node and the 
second node receives a message from the first node that 
the attack also hit the first node is not greater than a first 
predetermined period of time; and 

wherein the localization of the attack commences at the 
second node as soon as the attack reaches the second 
node and the elapsed time until the second node iden- 
tifies the attack and determines whether the first node 
also detected that attack is not greater than a second 
predetermined period of time. 
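The timing relationships recited in claims 3 to 5 can be illustrated with a small bookkeeping sketch. This is not taken from the specification: the names `NodeTiming` and `within_bounds` are hypothetical, the integer time units are arbitrary, and the mapping of the two "predetermined periods" of claim 5 onto two separately supplied limits is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeTiming:
    capture: int    # time to capture messages from neighboring nodes
    process: int    # time to process them together with local information
    generate: int   # time to generate outgoing messages

    @property
    def detection(self) -> int:
        # Claim 4: detection time = capture time + processing time.
        return self.capture + self.process

    @property
    def total(self) -> int:
        # Claim 3: processing time = detection time + message generation delay.
        return self.detection + self.generate


def within_bounds(message_lag: int, localization_time: int,
                  first_bound: int, second_bound: int) -> bool:
    """Check the two limits of claim 5 at the second (downstream) node.

    message_lag: interval between the attack hitting the second node and
    the arrival of the first node's message.
    localization_time: elapsed time until the second node has identified
    the attack and resolved whether the first node also detected it.
    """
    return message_lag <= first_bound and localization_time <= second_bound
```

Because every quantity is local to a node and its neighbor, the check involves neither the number of nodes in the network nor its physical span.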

6. A method for processing information in a node of a 
communication network comprising the steps of: 

(a) computing a node status S of a node N1 at a time T;

(b) transmitting a message including the node status 
information S to nodes downstream; 

(c) determining if the status information indicates an 
alarm status for the node; 

(d) in response to the status information not indicating an 
alarm status for the node, ending processing; 

(e) in response to the status information indicating an 
alarm status at the node performing the steps of: 

(1) determining if any alarm messages arrive at the 
node within a predetermined time interval; 

(2) in response to no alarm messages arriving at the 
node within the predetermined time interval, setting 
the node status of the node to alarm; and 
(3) in response to at least one alarm message arriving
at the node within the predetermined time interval 
setting the node status of the node to no alarm. 
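Steps (c) through (e) of claim 6 amount to a purely local decision, which may be sketched as follows. The `Message` record, the function name `resolve_status`, and the choice of the window as the closed interval from T to T plus the predetermined interval are assumptions made for illustration; step (b), the transmission of the status downstream, is left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Message:
    arrival_time: float   # hypothetical timestamp, same units as the interval
    alarm: bool           # True if the sender reported an alarm status


def resolve_status(local_alarm: bool, t: float, interval: float,
                   inbox: Iterable[Message]) -> bool:
    """Return the node status after steps (c)-(e); True means alarm."""
    if not local_alarm:
        return False                       # step (d): nothing to resolve
    heard_alarm = any(
        m.alarm and t <= m.arrival_time <= t + interval   # step (e)(1)
        for m in inbox
    )
    return not heard_alarm                 # (e)(2): alarm, (e)(3): no alarm
```

For example, `resolve_status(True, 0.0, 1.0, [Message(0.5, True)])` returns `False`, since an alarm message arrived inside the window, whereas an empty inbox leaves the node in the alarm state.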

7. A method for processing information in a node of a 
communications network comprising the steps of:

(a) computing a status of a node at a first time; 

(b) transmitting the one or more messages including the 
node status on arcs leaving the node; 

(c) collecting all messages arriving at the node within a
predetermined time interval; 

(d) computing at least one response to be included in at 
least one message wherein each of the at least one 
responses depends upon a node status of the node and 
information contained in the messages which arrived at
the node within the predetermined time interval; and 

(e) transmitting at least one message including one of the 
at least one responses on arcs leaving the node. 
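One round of the method of claim 7 may be sketched as a function that takes the node status and the collected messages and returns the two batches of outgoing messages. The names `node_round` and `example_response` are hypothetical, messages are reduced to plain strings, and the particular response rule shown is only one possible choice; the claim leaves the form of the response open. For simplicity the sketch sends on every arc it is given.

```python
from typing import Callable, Dict, List, Tuple

# respond(status, received_messages, arc) -> response to send on that arc
ResponseFn = Callable[[str, List[str], str], str]


def node_round(status: str, out_arcs: List[str], received: List[str],
               respond: ResponseFn) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (status messages, response messages) keyed by outgoing arc."""
    # Step (b): the node status goes out on the arcs leaving the node.
    status_msgs = {arc: status for arc in out_arcs}
    # Step (c) is represented by `received`, already collected by the caller
    # during the predetermined time interval.
    # Steps (d) and (e): one response per arc, from status plus messages.
    responses = {arc: respond(status, received, arc) for arc in out_arcs}
    return status_msgs, responses


def example_response(status: str, received: List[str], arc: str) -> str:
    """Hypothetical response rule; it ignores `arc` and answers uniformly."""
    if status != "Attack":
        return "DontKnow"
    return "NotMine" if "Attack" in received else "Mine"
```

Passing the response rule as a parameter keeps the round structure fixed while allowing a different rule per application.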

8. The method of claim 7, wherein the step of transmitting 
messages on arcs leaving the node includes the step of
transmitting messages on all arcs leaving the node. 

9. The method of claim 1 wherein said message process- 
ing step comprises the steps of: 

(a) determining a status of the message processing node; 
and

(b) concluding that said attack was passed to the message 
processing node if said status indicates an alarm and at 
least one message was received within a predetermined 
period of time by the message processing node from the 
message transmitting node indicating an alarm at the
message transmitting node. 

10. The method of claim 9 wherein said message pro- 
cessing step further comprises the step of: 

(c) concluding the message processing node was the first
node to sustain an attack on said channel if said status 
indicates an alarm and the message processing node 
does not receive a message indicating an alarm status at 
another node within the predetermined period of time. 

11. Apparatus provided at each of a plurality of nodes of
a network for identifying the location of a fault in the 
network, said apparatus comprising: 

(a) a fault detector for detecting a fault at a respective 
node and for providing a fault status signal indicating 
whether or not said node has experienced a fault and for
transmitting said fault status signal to at least one other 
node in said network; and 

(b) a response processor responsive to said fault status 
signal of the respective node and to a message received 
by said node from another node in said network within
a predetermined time for updating the fault status signal
of the respective node.

12. The apparatus of claim 11 wherein said response 
processor is operative to provide an updated fault status 
signal indicating that the respective node is the source of the
fault if the fault status signal indicates detection of a fault 
and either (a) the respective node does not receive a message 
from another node in the network within the predetermined 
period of time or (b) the respective node receives a message 
from another node in the network within the predetermined
time indicating that the other node did not experience a fault. 
13. The apparatus of claim 11 wherein said response
processor is operative to provide an updated fault status 
signal indicating that the respective node is not the source of 
the fault if the fault status signal indicates detection of a fault 
and the respective node receives a message from another 
node in the network within the predetermined time indicating
that the other node experienced a fault.
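The behavior of the response processor recited in claims 12 and 13 can be summarized as a three-way decision. In the sketch below, the enumeration `Origin` and the function `update_fault_status` are hypothetical names, and a single neighboring node is assumed, so that at most one report has to be considered per decision.

```python
from enum import Enum
from typing import Optional


class Origin(Enum):
    NO_FAULT = "no fault"
    SOURCE = "source"
    NOT_SOURCE = "not source"


def update_fault_status(fault_detected: bool,
                        neighbor_fault: Optional[bool]) -> Origin:
    """Classify the node from its own detector and one neighbor report.

    neighbor_fault is None when no message arrived within the
    predetermined time, otherwise the fault status the neighbor reported.
    """
    if not fault_detected:
        return Origin.NO_FAULT
    if neighbor_fault:
        return Origin.NOT_SOURCE   # claim 13: the other node also saw a fault
    # Claim 12: (a) no message arrived, or (b) the message reported no fault.
    return Origin.SOURCE
```

With several neighbors, a rule for combining conflicting reports would be needed; none is assumed here.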

14. A network comprising: 

a plurality of nodes, each one comprising: 

a fault detector for providing a fault status signal 
indicative of whether or not said node has experi- 
enced a fault; and 
a response processor responsive to said fault status 
signal and to one or more messages received at said 
node from other nodes in said network for determin- 
ing whether or not said fault originated at said node 
or whether said fault was propagated by the other 
node; and 

at least one channel interconnecting the plurality of nodes. 

15. A method for detecting the source of a fault in a 
network comprising a plurality of nodes, said method com- 
prising the steps of: 

(a) generating at a source node data for transmission to a 
destination node; 

(b) transmitting said data to a next node within said 
network; 

(c) transmitting a status message from the source node to 
said next node; 

(d) receiving said data at said next node; 

(e) receiving said status message at said next node; 

(f) said next node determining whether an attack has been 
detected at said next node and said source node; 

(g) determining whether a status message of said next 
node is enabled and: 

(1) if the status message is enabled, disabling said 
status message in order to provide an indication that 
the next node is the source of the attack and trans- 
mitting the data and status message to a further next 
node if the next node is not the destination node; and 

(2) if the status message is disabled, providing an indication
that the next node is not the source of the
attack. 
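One possible reading of step (g) of claim 15 is that a one-bit status message travels with the data and is disabled by the first node that detects the attack. The sketch below follows that reading. It is illustrative only: the function `trace_attack` is a hypothetical name, the status message is assumed to start enabled, the attack is assumed to be detected at every node from its origin onward, and, as a simplification, the first node on the path is treated like any other node.

```python
def trace_attack(path, attack_origin):
    """Walk data plus a one-bit status flag along `path`.

    Returns {node: verdict}. `attack_origin` is the hypothetical node at
    which the attack first appears (None for no attack).
    """
    flag_enabled = True          # assumed initial state of the status message
    attacked = False
    verdicts = {}
    for node in path:
        attacked = attacked or node == attack_origin
        if not attacked:
            verdicts[node] = "no attack"
        elif flag_enabled:
            flag_enabled = False             # step (g)(1): disable the flag
            verdicts[node] = "source"
        else:
            verdicts[node] = "not source"    # step (g)(2)
    return verdicts
```

For a path A, B, C, D with the attack appearing at B, the call `trace_attack(["A", "B", "C", "D"], "B")` marks only B as the source, and C and D as not the source.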

16. A method for optimizing alarm recovery, comprising 
the steps of: 

computing the status of a node; 

determining whether the computed node status is an alarm 
status; 

receiving a message at the node from a downstream node
within a predetermined period of time;

determining whether said received message is an alarm
message;

providing an alarm signal if said received message is an 
alarm message and providing an alert signal if said 
received message is not an alarm message. 